
Increase running pod memory limit for rapid_appends to prevent cgroup OOM #1282

Merged
yaozile123 merged 1 commit into GoogleCloudPlatform:main from
yaozile123:fix-rapid-appends-oom
Mar 30, 2026

@yaozile123 (Collaborator) commented Mar 27, 2026

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind failing-test

What this PR does / why we need it:
The rapid_appends GCSFuse integration tests fail with exit code 137 in file cache environments because the container reaches its memory limit and is killed by the cgroup OOM killer. The file cache suites default the volume-tester pod to a 1Gi limit, while compiling and executing the rapid_appends package can peak at around 1Gi of memory usage.
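
For readers unfamiliar with exit code 137: by shell convention, a process terminated by a signal reports 128 plus the signal number, so 137 corresponds to signal 9 (SIGKILL), which is what the cgroup OOM killer sends. The helper below is a minimal illustrative sketch; `signalFromExitCode` is a hypothetical name and is not part of this PR or the driver.

```go
package main

import "fmt"

// sigKill is the POSIX signal number for SIGKILL.
const sigKill = 9

// signalFromExitCode is a hypothetical helper (not from the PR). It decodes a
// shell-style exit code of the form 128+N into the signal number N. The
// second return value is false when the code does not encode a signal.
func signalFromExitCode(code int) (int, bool) {
	if code <= 128 {
		return 0, false
	}
	return code - 128, true
}

func main() {
	sig, ok := signalFromExitCode(137)
	fmt.Println(sig, ok, sig == sigKill) // prints "9 true true"
}
```

Note that exit code 137 only shows the process received SIGKILL; attributing it to a cgroup OOM kill, as this PR does, relies on additional evidence such as the container's termination reason.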

This PR updates configureLargeFileResources to ensure that rapid_appends tests bump the volume-tester pod to a standard 3Gi memory limit, preventing hard limit breaches.
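
The selection logic described above could look roughly like the sketch below. This is an assumption-laden illustration, not the actual body of configureLargeFileResources: the helper name `memoryLimitFor`, the constants, and the substring match on the test name are all hypothetical, and `read_cache` is only a placeholder for some other test. The only values taken from the PR are the 1Gi default and the 3Gi bump for rapid_appends.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical constants. The PR states only that the file cache suites
// default the volume-tester pod to 1Gi and that rapid_appends is raised to 3Gi.
const (
	defaultMemoryLimit = "1Gi"
	largeMemoryLimit   = "3Gi"
)

// memoryLimitFor is a hypothetical helper, not the driver's real code.
// It returns the larger limit for rapid_appends tests and the default otherwise.
func memoryLimitFor(testName string) string {
	if strings.Contains(testName, "rapid_appends") {
		return largeMemoryLimit
	}
	return defaultMemoryLimit
}

func main() {
	// "read_cache" is a placeholder name used only for illustration.
	for _, name := range []string{"rapid_appends", "read_cache"} {
		fmt.Printf("%s -> %s\n", name, memoryLimitFor(name))
	}
}
```

In the real test suite the limit would be applied to the pod's container resource requirements rather than returned as a string; the sketch only shows the per-test branching.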


Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:
Tested on the managed driver with ZB enabled:

make e2e-test E2E_TEST_USE_GKE_MANAGED_DRIVER=true ENABLE_ZB=true E2E_TEST_FOCUS=rapid_appends

Ran 8 of 430 Specs in 1421.077 seconds
SUCCESS! -- 8 Passed | 0 Failed | 0 Pending | 422 Skipped


Ginkgo ran 1 suite in 23m50.099545872s
Test Suite Passed

@google-oss-prow

@yaozile123: The label(s) kind/failing-test cannot be applied, because the repository doesn't have them.


In response to this:

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind failing-test

What this PR does / why we need it:
The rapid_appends GCSFuse integration tests fail with exit code 137 in file cache environments due to reaching the container's memory limit (Killed by cgroup OOM). This happens because the file cache suites default the volume-tester pod to a 1Gi limit, while compiling and executing the rapid_appends package can peak at around 934MB of RSS.

This PR updates configureLargeFileResources to ensure that rapid_appends tests bump the volume-tester pod to a standard 3Gi memory limit, preventing hard limit breaches.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:
Tested on the managed driver with ZB enabled:

make e2e-test E2E_TEST_USE_GKE_MANAGED_DRIVER=true ENABLE_ZB=true E2E_TEST_FOCUS=rapid_appends

Ran 8 of 430 Specs in 1421.077 seconds
SUCCESS! -- 8 Passed | 0 Failed | 0 Pending | 422 Skipped


Ginkgo ran 1 suite in 23m50.099545872s
Test Suite Passed

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@google-oss-prow

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request modifies the configureLargeFileResources function in the GCS Fuse integration test suite to set resource requirements for the test pod when performing rapid append tests. Feedback was provided regarding a mismatch in the memory limit, suggesting it be adjusted to 3Gi for consistency with the sidecar container's configuration.

@yaozile123 yaozile123 marked this pull request as ready for review March 28, 2026 00:06
@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: amacaskill, yaozile123

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yaozile123 yaozile123 merged commit 668ba34 into GoogleCloudPlatform:main Mar 30, 2026
8 of 10 checks passed